CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis
Abstract
With the rapid increase of omics data, correlation analysis has become an indispensable tool for inferring meaningful associations from a large number of observations. The Pearson correlation coefficient (PCC) and its variants are widely used for this purpose. However, it remains challenging to test whether an observed association is reliable both statistically and biologically. We present a new method, CorSig, for statistical inference of correlation significance. CorSig is based on a biology-informed null hypothesis: it tests whether the true PCC (ρ) between two variables is statistically larger than a user-specified PCC cutoff (τ). Existing methods instead use the simple null hypothesis ρ = 0, which tests only whether an association can be declared at all, without a threshold. CorSig incorporates Fisher's Z transformation of the observed PCC (r), which facilitates the use of standard techniques for p-value computation and multiple testing correction. We compared CorSig against two methods: one uses a minimum PCC cutoff, while the other (Zhu's procedure) controls correlation strength and statistical significance in two discrete steps. CorSig consistently outperformed both methods across various simulated data scenarios by balancing false positives against false negatives. When tested on real-world Populus microarray data, CorSig effectively identified co-expressed genes in the flavonoid pathway and discriminated between closely related gene family members according to their differential association with the flavonoid and lignin pathways. The p-values obtained by CorSig can be used as a stand-alone parameter for stratifying co-expressed genes by correlation strength, in lieu of an arbitrary cutoff. CorSig requires a single tunable parameter and can be readily extended to other correlation measures. Thus, CorSig should be useful for a wide range of applications, particularly network analysis of high-dimensional genomic data.
SOFTWARE AVAILABILITY A web server for CorSig is provided at http://202.127.200.1:8080/probeWeb. R code for CorSig is freely available for non-commercial use at http://aspendb.uga.edu/downloads.
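The thresholded test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R implementation. It assumes a one-sided test of H0: ρ ≤ τ against H1: ρ > τ, the usual large-sample result that Fisher's Z, atanh(r), is approximately normal with standard error 1/sqrt(n − 3), and Benjamini-Hochberg as the multiple testing correction (the abstract does not say which correction is used). The function names `corsig_pvalue` and `benjamini_hochberg` are hypothetical.

```python
import math
from statistics import NormalDist


def corsig_pvalue(r, n, tau):
    """One-sided p-value for H0: rho <= tau vs. H1: rho > tau.

    r   : observed Pearson correlation, strictly between -1 and 1
    n   : number of samples used to compute r (must exceed 3)
    tau : user-specified correlation cutoff, 0 <= tau < 1
    Assumes Fisher's Z, atanh(r), is approximately normal with
    standard error 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if not 0.0 <= tau < 1.0:
        raise ValueError("tau must satisfy 0 <= tau < 1")
    # Distance between observed and cutoff correlations on the Z scale,
    # expressed in standard-error units.
    z_stat = (math.atanh(r) - math.atanh(tau)) * math.sqrt(n - 3)
    # Upper-tail probability under the standard normal distribution.
    return 1.0 - NormalDist().cdf(z_stat)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical example: three gene pairs measured on 50 samples, tau = 0.5.
    observed = [0.9, 0.6, 0.3]
    raw = [corsig_pvalue(r, n=50, tau=0.5) for r in observed]
    for r, p, q in zip(observed, raw, benjamini_hochberg(raw)):
        print(f"r = {r:.2f}  p = {p:.3g}  adjusted p = {q:.3g}")
```

Setting τ = 0 in this sketch gives the one-sided form of the conventional ρ = 0 test, which makes the contrast drawn in the abstract concrete: raising τ requires an observed correlation to exceed the cutoff by a margin that depends on the sample size before it is called significant.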